Decentralized Large-Scale Natural Language Processing Using Gossip Learning

Author: Alkathiri Abdul Aziz
Publication date: 23/06/2020

Decentraliserad Storskalig Naturlig Språkbehandling med Hjälp av Skvallerinlärning

Author: Alkathiri Abdul Aziz
Publication venue: KTH, Skolan för elektroteknik och datavetenskap (EECS)
Publication date: 01/01/2020

Natural Language Processing (NLP), the area of machine learning concerned with natural human language and computers, has seen rising popularity and use in recent years. It has led to the research and development of many algorithms that produce word embeddings. One of the most widely used of these algorithms is Word2Vec. With the abundance of data generated by users and organizations and the complexity of machine learning and deep learning models, training on a single machine becomes infeasible. Advances in distributed machine learning offer a solution to this problem. Unfortunately, for reasons of data privacy and regulation, in some real-life scenarios the data must not leave its local machine. This limitation has led to the development of techniques and protocols that are massively parallel and data-private. The most popular of these protocols is federated learning. However, due to its centralized nature, it still poses some security and robustness risks. This in turn led to the development of massively parallel, data-private, decentralized approaches such as gossip learning. In the gossip learning protocol, each node in the network periodically chooses a random peer for information exchange, which eliminates the need for a central node. This research tests the viability of gossip learning for large-scale, real-world applications. In particular, it focuses on the implementation and evaluation of a Natural Language Processing application using gossip learning. The results show that applying Word2Vec in a gossip learning framework is viable and yields results comparable to its non-distributed, centralized counterpart in various scenarios, with an average loss in quality of 6.904%.
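The peer-exchange step described in the abstract above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the thesis implementation: models are plain lists of floats rather than Word2Vec embeddings, the exchange is a symmetric pairwise average, and `local_step`, `gossip_round`, `simulate`, and the "private target" vectors are hypothetical names and stand-ins introduced only for illustration.

```python
import random


def local_step(model, private_target, lr=0.1):
    # Stand-in (assumption) for local training: nudge the model toward a
    # target derived from the node's private data. The data itself is
    # never sent to any other node.
    return [w + lr * (t - w) for w, t in zip(model, private_target)]


def gossip_round(models, rng):
    # Every node picks one random peer and the pair averages their models.
    # No central node is involved at any point.
    n = len(models)
    for i in range(n):
        j = rng.choice([k for k in range(n) if k != i])
        merged = [(a + b) / 2.0 for a, b in zip(models[i], models[j])]
        models[i] = list(merged)
        models[j] = list(merged)


def simulate(private_targets, rounds=100, lr=0.1, seed=0):
    # Alternate local updates and gossip exchanges for a fixed number of rounds.
    rng = random.Random(seed)
    dim = len(private_targets[0])
    models = [[0.0] * dim for _ in private_targets]
    for _ in range(rounds):
        models = [local_step(m, t, lr) for m, t in zip(models, private_targets)]
        gossip_round(models, rng)
    return models


if __name__ == "__main__":
    # Four nodes, each with its own (toy) private target.
    targets = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    for node, model in enumerate(simulate(targets)):
        print(node, [round(w, 3) for w in model])
```

Only model parameters cross node boundaries in this sketch; the private targets stay where they are. Symmetric averaging was chosen here because it keeps the network-wide mean of the parameters unchanged during an exchange, which makes the behaviour easy to check. Actual gossip learning protocols may instead push a model one way and merge it at the receiver.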

Publikationer från KTH

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Decentralized Word2Vec Using Gossip Learning

Authors: Alkathiri Abdul Aziz, Giaretta Lodovico, Girdzijauskas Sarunas, Sahlgren Magnus
Publication venue: KTH, Programvaruteknik och datorsystem, SCS
Publication date: 01/01/2021

Advanced NLP models require huge amounts of data from various domains to produce high-quality representations. It is useful then for a few large public and private organizations to join their corpora during training. However, factors such as legislation and user emphasis on data privacy may prevent centralized orchestration and data sharing among these organizations. Therefore, for this specific scenario, we investigate how gossip learning, a massively-parallel, data-private, decentralized protocol, compares to a shared-dataset solution. We find that the application of Word2Vec in a gossip learning framework is viable. Without any tuning, the results are comparable to a traditional centralized setting, with a reduction in ground-truth similarity scores as low as 4.3%. Furthermore, the results are up to 54.8% better than independent local training.

Publikationer från KTH
